Recap

These are the first experiments where I use version 2 chemistry. Read 2 is now the barcode and UMI. Read 1 is the cDNA. This was done to avoid reading through the TSO on Ilumina read 1

This sequencing run consists of 3 experiments:

  1. Test the version 2 protocol with magnetic poly-T Dynabeads https://au-mynotebook.labarchives.com/share/Piper_Research_Project/MjAuOHwxNzE3NjUvMTYvVHJlZU5vZGUvNDE3OTUxMTA4N3w1Mi44
  2. Benchmark the version 2 protocol with a mixture of human and mouse cell lines Daniel Brown generated https://au-mynotebook.labarchives.com/share/Piper_Research_Project/MjkuOTAwMDAwMDAwMDAwMDAyfDE3MTc2NS8yMy9UcmVlTm9kZS8yMzM4NDg4MTE1fDc1Ljg5OTk5OTk5OTk5OTk5
  3. Test version 2 protocol with a biological appliation from Zac Moore of Brain Cancer lab https://au-mynotebook.labarchives.com/share/Piper_Research_Project/MzEuMjAwMDAwMDAwMDAwMDAzfDE3MTc2NS8yNC9UcmVlTm9kZS8zMzczNDM0NjE1fDc5LjE5OTk5OTk5OTk5OTk5

Notebook aim

Differential expression testing. Use a spline based analysis from the edgeR manual.
In notebook 3B I verbatim reproduce analysis Zac Moore sent me.

Read SCE and extract col data

This object was generated in 2A_Clustering.
Subset only day 3 timepoint for simplicity and look at varying doses.

sce <- readRDS(here::here(
  "data/TIRE_brain_human/SCEs/", "brainCancer_subset.sce.rds"
))

dge <- scran::convertTo(sce, type="edgeR")

day <- 3
dge <- dge[,dge$samples$Day_Exposure %in% day]

tb <- as_tibble(dge$samples)

Differential expression testing

Have a look at the important metadata.

tb %>% 
  dplyr::count(Drug, Day_Exposure, Dose_M) %>% 
  arrange(Day_Exposure, Dose_M)

Remove genes that are lowly expressed. 12,000 genes are kept

keep <- filterByExpr(dge, group=dge$samples$Drug, min.count=1)
dge <- dge[keep,]
summary(keep)
##    Mode   FALSE    TRUE 
## logical   11851   12401

Correct for composition biases by computing normalization factors with the trimmed mean of M-values method.

dge <- calcNormFactors(dge)

Check the MDS plot

limma::plotMDS(dge, labels=dge$samples$Dose_M)

Perform differential expression testing

Use a cubic regression spline on dose with 3 degrees of freedom.

splines <- ns(dge$samples$Dose_M, df = 3)

design <- model.matrix(~splines, dge$samples)
head(design)
##            (Intercept)    splines1   splines2    splines3
## AAAGACTTCG           1 -0.03452765 0.07280511 -0.03821946
## AACGTCAGCC           1 -0.01100756 0.02317551 -0.01216612
## ACAGTAGCTC           1 -0.16611354 0.34968007  0.81643346
## AGCACGTTTA           1 -0.23455942 0.54199354 -0.28442210
## AGTGACAGCG           1  0.30020461 0.49821607 -0.15593612
## ATCAAGAACG           1 -0.16249002 0.72269905 -0.37463953

The three coefficients do not have any particular meaning. Hypothesis testing would only make sense if the three coefficients are assessed together. The advantage of using a cubic spline curve is that it provides more stable fit at the end points compared to a polynomial.

The spline curve with 3 degrees of freedom has 2 knots where cubic polynomials are splined together. In general, choosing a number of degrees of freedom to be in range of 3-5 is reasonable.

Fit BCV

Compared to the edgeR example on timecourse analysis the biological variation is pretty low.

par(mfrow=c(1,2))

dge <- estimateDisp(dge, design)
summary(dge$trended.dispersion)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01261 0.01562 0.02242 0.02894 0.03809 0.06340
plotBCV(dge)

fit <- glmQLFit(dge, design, robust=TRUE)
summary(fit$var.prior)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9317  1.0120  1.0613  1.0480  1.0896  1.1198
plotQLDisp(fit)

Dose response trend analysis

In a dose response experiment, we are looking for genes that change expression level across doses. Here, the design matrix uses 3 natural spline basis vectors to model smooth changes over concentration, without assuming any particular pattern to the trend.
We test for a trend by conducting F-tests on 3 df for each gene:

fit <- glmQLFTest(fit, coef=2:4)

The topTags function lists the top set of genes with most significant dose effects.

results <- as_tibble(as.data.frame(
  topTags(fit, n=length(fit$genes$Symbol))
  ))
results

The total number of genes with significant (5% FDR) changes at different doses can be examined with decideTests.

There are 681 differentially expressed genes across dose.

summary(decideTests(fit))
##        splines3-splines2-splines1
## NotSig                      11727
## Sig                           674

Note that all three spline coefficients should be tested together in this way. It is not meaningful to replace the F-tests with t-tests for the individual coefficients, and similarly the logFC columns of the top table do not have any interpretable meaning.
The trends should instead be interpreted by way of trend plots, as we show now.
Finally, we visualize the fitted spline curves for the top four genes. We start by computing the observed and fitted log-CPM values for each gene:

top_genes <- head(results$Symbol,6)

logCPM.obs <- as.data.frame(cpm(dge, log=TRUE, prior.count=fit$prior.count))
logCPM.obs$ID <- row.names(logCPM.obs)
logCPM.fit <- as.data.frame(cpm(fit, log=TRUE))
logCPM.fit$ID <- row.names(logCPM.fit)

# Mung the cpm columns into a single tibble
lcm.obs <- pivot_longer(logCPM.obs[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>% 
  rename(obs_cpm = count)
lcm.fit <- pivot_longer(logCPM.fit[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>% 
  rename(fit_cpm = count)

lcm.obs$fit_cpm <- lcm.fit$fit_cpm

# Add back the sample metadata
lcm <- left_join(lcm.obs, results[,c("ID", "Symbol")])
lcm <- left_join(lcm, dge$samples[,c("Well_BC", "Dose_M")], by=c("sample" = "Well_BC"))

Log scale

Generate the plot.
It makes more sense to plot x axis in log scale

custom_labels <- c("Vehicle", "100 nM", "1 \u03bcM", "10 \u03bcM", "100 \u03bcM")

ggplot(lcm) +
  geom_point(aes(y = obs_cpm, x=Dose_M), color = "black") +
  geom_smooth(aes(y = fit_cpm, x=Dose_M), method = "loess", se = FALSE, color = "red") +
  xlab("TMZ (M)") + ylab("logCPM") +
  scale_x_continuous(trans = "log10") + annotation_logticks(sides="b") + 
  theme_Publication() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
  theme(strip.text = element_text(face = "bold.italic"))

Linear scale

Generate the plot linear x axis. The low doses are too compressed which is why the loess curve looks strange.

custom_labels <- c("Vehicle", "100 nM", "1 \u03bcM", "10 \u03bcM", "100 \u03bcM")

ggplot(lcm) +
  geom_point(aes(y = obs_cpm, x=Dose_M), color = "black") +
  geom_smooth(aes(y = fit_cpm, x=Dose_M), method = "loess", se = FALSE, color = "red") +
  xlab("TMZ (M)") + ylab("logCPM") +
  theme_Publication() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
  facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
  theme(strip.text = element_text(face = "bold.italic"))

Gene descriptions

  • MDM2 gene encodes a protein that negatively regulates the p53 tumor suppressor. MDM2 binds to p53, inhibiting its transcriptional activity and promoting its degradation via the ubiquitin-proteasome pathway. This regulation is crucial for controlling cell cycle progression and apoptosis.
  • HMGCS1 gene encodes 3-hydroxy-3-methylglutaryl-CoA synthase 1, an enzyme crucial for the mevalonate pathway, which is key in cholesterol biosynthesis and the production of other isoprenoids.
    • HMGCS1 catalyzes the first step in this pathway, combining acetyl-CoA and acetoacetyl-CoA to form HMG-CoA. This gene’s expression is regulated by various factors, including sterol levels.
    • Alterations in HMGCS1 can impact lipid metabolism and are associated with metabolic disorders and diseases such as cancer.
  • DDIT3 gene encodes a transcription factor known as CHOP (C/EBP homologous protein) or GADD153. This protein is involved in cellular stress responses, particularly endoplasmic reticulum (ER) stress.
    • DDIT3 promotes apoptosis by regulating genes associated with cell death and differentiation.
    • Its overexpression is linked to increased apoptosis, contributing to diseases such as diabetes, neurodegenerative disorders, and cancer.
    • DDIT3’s role in stress responses makes it a potential target for therapeutic intervention in related diseases.
  • TXNIP encodes thioredoxin interacting protein (TXNIP), which is a key regulator of cellular redox balance and glucose metabolism.
    • TXNIP binds to and inhibits thioredoxin, a protein involved in reducing oxidative stress. By modulating oxidative stress and glucose uptake, TXNIP plays a crucial role in various physiological and pathological processes, including diabetes, cancer, and cardiovascular diseases.
    • Elevated levels of TXNIP are often associated with increased oxidative stress and apoptosis.

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
## 
## Matrix products: default
## BLAS:   /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so 
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
##  [1] grid      splines   stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ggthemes_5.1.0              here_1.0.1                 
##  [3] patchwork_1.3.0             scater_1.32.1              
##  [5] edgeR_4.2.2                 limma_3.60.6               
##  [7] biomaRt_2.60.1              scuttle_1.14.0             
##  [9] lubridate_1.9.3             forcats_1.0.0              
## [11] stringr_1.5.1               dplyr_1.1.4                
## [13] purrr_1.0.2                 readr_2.1.5                
## [15] tidyr_1.3.1                 tibble_3.2.1               
## [17] ggplot2_3.5.1               tidyverse_2.0.0            
## [19] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [21] Biobase_2.64.0              GenomicRanges_1.56.2       
## [23] GenomeInfoDb_1.40.1         IRanges_2.38.1             
## [25] S4Vectors_0.42.1            BiocGenerics_0.50.0        
## [27] MatrixGenerics_1.16.0       matrixStats_1.4.1          
## 
## loaded via a namespace (and not attached):
##   [1] rstudioapi_0.17.0         jsonlite_1.8.9           
##   [3] magrittr_2.0.3            ggbeeswarm_0.7.2         
##   [5] farver_2.1.2              rmarkdown_2.28           
##   [7] zlibbioc_1.50.0           vctrs_0.6.5              
##   [9] memoise_2.0.1             DelayedMatrixStats_1.26.0
##  [11] htmltools_0.5.8.1         S4Arrays_1.4.1           
##  [13] progress_1.2.3            curl_5.2.3               
##  [15] BiocNeighbors_1.22.0      SparseArray_1.4.8        
##  [17] sass_0.4.9                bslib_0.8.0              
##  [19] httr2_1.0.5               cachem_1.1.0             
##  [21] igraph_2.0.3              lifecycle_1.0.4          
##  [23] pkgconfig_2.0.3           rsvd_1.0.5               
##  [25] Matrix_1.7-0              R6_2.5.1                 
##  [27] fastmap_1.2.0             GenomeInfoDbData_1.2.12  
##  [29] digest_0.6.37             colorspace_2.1-1         
##  [31] AnnotationDbi_1.66.0      rprojroot_2.0.4          
##  [33] dqrng_0.4.1               irlba_2.3.5.1            
##  [35] RSQLite_2.3.7             beachmat_2.20.0          
##  [37] labeling_0.4.3            filelock_1.0.3           
##  [39] fansi_1.0.6               timechange_0.3.0         
##  [41] mgcv_1.9-1                httr_1.4.7               
##  [43] abind_1.4-8               compiler_4.4.1           
##  [45] bit64_4.5.2               withr_3.0.1              
##  [47] BiocParallel_1.38.0       viridis_0.6.5            
##  [49] DBI_1.2.3                 highr_0.11               
##  [51] rappdirs_0.3.3            DelayedArray_0.30.1      
##  [53] bluster_1.14.0            tools_4.4.1              
##  [55] vipor_0.4.7               beeswarm_0.4.0           
##  [57] glue_1.8.0                nlme_3.1-164             
##  [59] cluster_2.1.6             generics_0.1.3           
##  [61] gtable_0.3.5              tzdb_0.4.0               
##  [63] hms_1.1.3                 metapod_1.12.0           
##  [65] BiocSingular_1.20.0       ScaledMatrix_1.12.0      
##  [67] xml2_1.3.6                utf8_1.2.4               
##  [69] XVector_0.44.0            ggrepel_0.9.6            
##  [71] pillar_1.9.0              BiocFileCache_2.12.0     
##  [73] lattice_0.22-6            bit_4.5.0                
##  [75] tidyselect_1.2.1          locfit_1.5-9.10          
##  [77] Biostrings_2.72.1         knitr_1.48               
##  [79] gridExtra_2.3             xfun_0.48                
##  [81] statmod_1.5.0             stringi_1.8.4            
##  [83] UCSC.utils_1.0.0          yaml_2.3.10              
##  [85] evaluate_1.0.1            codetools_0.2-20         
##  [87] cli_3.6.3                 munsell_0.5.1            
##  [89] jquerylib_0.1.4           Rcpp_1.0.13              
##  [91] dbplyr_2.5.0              png_0.1-8                
##  [93] parallel_4.4.1            blob_1.2.4               
##  [95] prettyunits_1.2.0         scran_1.32.0             
##  [97] sparseMatrixStats_1.16.0  viridisLite_0.4.2        
##  [99] scales_1.3.0              crayon_1.5.3             
## [101] rlang_1.1.4               KEGGREST_1.44.1